Best Agglomerative Ranked Subset for Feature Selection

Authors

  • Roberto Ruiz Sánchez
  • José Cristóbal Riquelme Santos
  • Jesús S. Aguilar-Ruiz

Abstract

The enormous growth in database size makes finding an optimal subset of features extremely difficult. This paper proposes a new feature selection method that allows any subset evaluator, including the wrapper evaluation method, to be used to find a group of features that distinguishes between the possible classes. The method, BARS (Best Agglomerative Ranked Subset), is based on the ideas of relevance and redundancy, in the sense that a ranked feature (or set) is more relevant if it adds information when it is included in the final subset of selected features. This heuristic method drastically reduces dimensionality and improves accuracy relative to the complete feature set and in comparison with other feature selection algorithms.
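
The relevance and redundancy idea in the abstract can be illustrated with a small sketch. This is not the authors' BARS algorithm, whose agglomerative steps the abstract does not spell out; it is a simplified greedy pass that ranks features individually and keeps a ranked feature only if it raises the score of the subset selected so far. The function name `ranked_incremental_selection`, the `min_gain` parameter, and the toy evaluator `toy_score` are all hypothetical names introduced here for illustration.

```python
from typing import Callable, List, Tuple

Subset = Tuple[int, ...]


def ranked_incremental_selection(
    n_features: int,
    evaluate: Callable[[Subset], float],
    min_gain: float = 0.0,
) -> List[int]:
    """Simplified sketch (an assumption, not the published BARS procedure).

    `evaluate` is any subset evaluator (filter or wrapper) that returns a
    higher score for a better feature subset.
    """
    # Relevance: rank each feature by the score of its one-element subset.
    ranking = sorted(range(n_features), key=lambda f: evaluate((f,)), reverse=True)

    selected: List[int] = []
    best = float("-inf")
    for f in ranking:
        score = evaluate(tuple(selected + [f]))
        # Redundancy: keep the feature only if it adds information,
        # i.e. the subset score improves by more than min_gain.
        if score > best + min_gain:
            selected.append(f)
            best = score
    return selected


# Toy evaluator (hypothetical): each feature "explains" some class
# distinctions, and a subset scores the number of distinctions it covers.
COVERAGE = {
    0: {"a", "b", "c"},
    1: {"a", "b"},  # redundant given feature 0
    2: {"d"},       # weak alone, but adds new information
    3: set(),       # irrelevant
}


def toy_score(subset: Subset) -> float:
    covered = set()
    for f in subset:
        covered |= COVERAGE[f]
    return float(len(covered))


chosen = ranked_incremental_selection(4, toy_score)
print(chosen)
```

In this toy setting, feature 1 ranks second on its own but is dropped because it adds nothing beyond feature 0, while the lower-ranked feature 2 is kept because it covers a new distinction. A wrapper evaluator would replace `toy_score` with, for example, the cross-validated accuracy of a classifier trained on the candidate subset.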

Related resources

Feature Selection Applied to Data from the Sloan Digital Sky Survey

In recent years there has been an explosion in the rate of acquisition of astronomical data. The analysis of astronomical data presents unprecedented opportunities and challenges for data mining in tasks such as clustering, object discovery, and classification. In this work, we address the feature selection problem in classification of photometric and spectroscopic data collected from the SDSS ...

A New Hybrid Feature Subset Selection Algorithm for the Analysis of Ovarian Cancer Data Using Laser Mass Spectrum

Introduction: A major problem in the treatment of cancer is the lack of an appropriate method for the early diagnosis of the disease. The chemical reaction within an organ may be reflected in the form of proteomic patterns in the serum, sputum, or urine. Laser mass spectrometry is a valuable tool for extracting the proteomic patterns from biological samples. A major challenge in extracting such ...

A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts

High-dimensional microarray datasets are difficult to classify since they have many features with a small number of instances and an imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...

A Parallel Genetic Algorithm Based Method for Feature Subset Selection in Intrusion Detection Systems

Intrusion detection systems are designed to provide security in computer networks, so that if an attacker gets past other security devices, they can detect and prevent the attack process. One of the most essential challenges in designing these systems is the so-called curse of dimensionality. Therefore, in order to obtain satisfactory performance in these systems we have to take advantage of app...

Publication date: 2008